31 research outputs found

    Brook GLES Pi: democratising accelerator programming

    Get PDF
    Nowadays computing is heavily-based on accelerators, however, the cost of the hardware equipment prevents equal access to heterogeneous programming. In this work we present Brook GLES Pi, a port of the accelerator programming language Brook. Our solution, primarily focused on the educational platform Raspberry Pi, allows to teach, experiment and take advantage of heterogeneous programming on any low-cost embedded device featuring an OpenGL ES 2 GPU, democratising access to accelerator programming.This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence.Peer ReviewedPostprint (author's final draft

    BaseSAFE: Baseband SAnitized Fuzzing through Emulation

    Full text link
    Rogue base stations are an effective attack vector. Cellular basebands represent a critical part of the smartphone's security: they parse large amounts of data even before authentication. They can, therefore, grant an attacker a very stealthy way to gather information about calls placed and even to escalate to the main operating system, over-the-air. In this paper, we discuss a novel cellular fuzzing framework that aims to help security researchers find critical bugs in cellular basebands and similar embedded systems. BaseSAFE allows partial rehosting of cellular basebands for fast instrumented fuzzing off-device, even for closed-source firmware blobs. BaseSAFE's sanitizing drop-in allocator, enables spotting heap-based buffer-overflows quickly. Using our proof-of-concept harness, we fuzzed various parsers of the Nucleus RTOS-based MediaTek cellular baseband that are accessible from rogue base stations. The emulator instrumentation is highly optimized, reaching hundreds of executions per second on each core for our complex test case, around 15k test-cases per second in total. Furthermore, we discuss attack vectors for baseband modems. To the best of our knowledge, this is the first use of emulation-based fuzzing for security testing of commercial cellular basebands. Most of the tooling and approaches of BaseSAFE are also applicable for other low-level kernels and firmware. Using BaseSAFE, we were able to find memory corruptions including heap out-of-bounds writes using our proof-of-concept fuzzing harness in the MediaTek cellular baseband. BaseSAFE, the harness, and a large collection of LTE signaling message test cases will be released open-source upon publication of this paper

    Fast and portable vector dsp simulation through automatic vectorization

    Get PDF
    \u3cp\u3eVector DSPs are quite common in embedded SoCs used in compute-intensive domains such as imaging and wireless communication. To achieve short time-to-market, it is crucial to provide system architects and SW developers with fast and accurate instruction set simulators of such DSPs. To this end, a methodology for accelerating the simulation of vector instructions in vector DSPs is proposed. The acceleration is achieved by enabling automatic translation of the vector instructions in a given vector DSP binary into host SIMD instructions. The key advantage of the proposed methodology is its independence from the host architecture. Empirical evaluation, using a set of commercial vector DSPs, shows that the proposed methodology provides a 4x average reduction in simulation time of a vector instruction and a 2x average reduction in simulation time of a whole application.\u3c/p\u3

    Harissa: A flexible and efficient Java environment mixing bytecode and compiled code

    No full text
    The Java language provides a promising solution to the design of safe programs, with an application spectrum ranging from Web services to operating system components. The well-known tradeoff of Java's portability is the inefficiency of its basic execution model, which relies on the interpretation of an object-based virtual machine. Many solutions have been proposed to overcome this problem, such as just-in-time (JIT) and off-line bytecode compilers. However, most compilers trade efficiency for either portability or the ability to dynamically load bytecode. In this paper, we present an approach which reconciles portability and efficiency, and preserves the ability to dynamically load bytecode. We have designed and implemented an efficient environment for the execution of Java programs, named Harissa. Harissa permits the mixing of compiled and interpreted methods. Harissa's compiler translates Java bytecode to C, incorporating aggressive optimizations such as virtual method call optimization based on the Clas

    Arbitrary and Variable Precision Floating-Point Arithmetic Support in Dynamic Binary Translation

    No full text
    International audienceFloating-point hardware support has more or less been settled 35 years ago by the adoption of the IEEE 754 standard. However, many scientific applications require higher accuracy than what can be represented on 64 bits, and to that end make use of dedicated arbitrary precision software libraries. To reach a good performance/accuracy trade-off, developers use variable precision, requiring e.g. more accuracy as the computation progresses. Hardware accelerators for this kind of computations do not exist yet, and independently of the actual quality of the underlying arithmetic computations, defining the right instruction set architecture, memory representations, etc, for them is a challenging task. We investigate in this paper the support for arbitrary and variable precision arithmetic in a dynamic binary translator, to help gain an insight of what such an accelerator could provide as an interface to compilers, and thus programmers. We detail our design and present an implementation in QEMU using the MPRF library for the RISCV processo

    Improving Remote Desktopping Through Adaptive Record/Replay

    No full text

    A fast analytical model of fully associative caches

    No full text
    While the cost of computation is an easy to understand local property, the cost of data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider the cost of data movement. Existing cache models and simulators provide the missing information but are computationally expensive. We present a lightweight cache model for fully associative caches with least recently used (LRU) replacement policy that gives fast and accurate results. We count the cache misses without explicit enumeration of all memory accesses by using symbolic counting techniques twice: 1) to derive the stack distance for each memory access and 2) to count the memory accesses with stack distance larger than the cache size. While this technique seems infeasible in theory, due to non-linearities after the first round of counting, we show that the counting problems are sufficiently linear in practice. Our cache model often computes the results within seconds and contrary to simulation the execution time is mostly problem size independent. Our evaluation measures modeling errors below 0.6% on real hardware. By providing accurate data placement information we enable memory hierarchy aware software development.Comment: 14 pages, 16 figures, PLDI1
    corecore